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Abstract — Compressed sensing allows perfect recovery 
of sparse signals (or signals sparse in some basis) using 
only a small number of random measurements. Existing 
results in compressed sensing literature have focused on 
characterizing the achievable performance by bounding the 
number of samples required for a given level of signal 
sparsity. However, using these bounds to minimize the 
number of samples requires a-priori knowledge of the 
sparsity of the unknown signal, or the decay structure for 
near-sparse signals. Furthermore, there are some popular 
recovery methods for which no such bounds are known. 

In this paper, we investigate an alternative scenario 
where observations are available in sequence. For any 
recovery method, this means that there is now a sequence 
of candidate reconstructions. We propose a method to 
estimate the reconstruction error directly from the samples 
themselves, for every candidate in this sequence. This 
estimate is universal in the sense that it is based only on the 
measurement ensemble, and not on the recovery method or 
any assumed level of sparsity of the unknown signal. With 
these estimates, one can now stop observations as soon as 
there is reasonable certainty of either exact or sufficiently 
accurate reconstruction. They also provide a way to obtain 
"run-time" guarantees for recovery methods that otherwise 
lack a-priori performance bounds. 

We investigate both continuous (e.g. Gaussian) and 
discrete (e.g. Bernoulli) random measurement ensembles, 
both for exactly sparse and general near-sparse signals, 
and with both noisy and noiseless measurements. 

Index Terms — Compressed sensing, sequential measure- 
ments, stopping rule. 



I. Introduction 

In compressed sensing (CS) ||T|, ||2l a few random 
linear measurements of a signal are taken, and the signal 
is recovered using the additional knowledge that either 
the signal or some linear transform of it is sparse. 
These ideas have generated a lot of excitement in the 
signal processing and machine learning communities, 
and have been applied to a range of applications such as 
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magnetic resonance imaging (MRI) [3|, computational 
photography [4], wireless networks |5|, and structure 
discovery in biological networks |6|. 

The applications where compressed sensing is most 
beneficial (e.g. MRI) have a high cost of acquiring each 
additional sample. If this cost (in terms of time, power, 
e.t.c) is high as compared to the cost of computation, 
then it is suitable to use sophisticated recovery algo- 
rithms which include the £i-based basis pursuit 17], 
greedy approaches |i8], and even non-convex (£p) or 
iterative formulations ||9l-| fTn to enable recovery from 
fewer measurements. 

While some of the recovery methods, especially those 
based on ^i-regularization, have analytically provable 
performance guarantees JJ), lfT2l . others, such as non- 
convex £p, reweighted ii ifTTl . and sparse Bayesian 
learning (SBL) lfT3l do not, and they have been shown 
empirically to often require even fewer samples than £i- 
based methods. Furthermore, when guarantees do exist, 
they have been empirically observed to sometimes be 
highly pessimistic and may require large dimensions to 
hold with high probability fT^, f\A\. Another drawback 
is that much of the existing analysis characterizes how 
many measurements are needed for a signal with a given 
sparsity level. However, as the sparsity level is often not 
known a-priori, it can be very challenging to use these 
results in practical settings. 

In this paper we take an alternative approach and 
we develop estimates and bounds for the reconstruction 
error using only the observations, without any a-priori 
assumptions on signal sparsity, or on the reconstruction 
method. We consider a scenario where one is able to get 
observations in sequence, and perform computations in 
between observations to decide whether enough samples 
have been obtained - thus allowing to recover the signal 
either exactly or to a given tolerance from the smallest 
possible number of random observations. This, however, 
requires a computationally efficient approach to detect 
exactly when enough samples have been received. To get 
an intuition behind our approach - suppose that we first 
attempt to reconstruct the signal while withholding some 
available observations, akin to cross-validation. The ob- 
servations correspond to a known linear function of the 
true signal, so if the reconstructed signal is quite different 
from the true signal, then the same linear function 



applied to our recovered signal will result in a value that 
is far from the actual observation, with high probability. 
Our results provide estimates of the reconstruction error 
based on the statistics of the measurement model. They 
can thus be used to provide 'run-time' guarantees even 
for decoders that are otherwise not amenable to analysis. 

We first consider the case when noiseless measure- 
ments are taken using the random Gaussian (or generic 
continuous) ensemble, and we show that simply checking 
for one-step agreement provides a way to check exactly 
when enough samples have been received. Suppose that 
after receiving M samples yi — a^x, i — 
we apply a sparse reconstruction method of our choice, 
and obtain a solution x^^ satisfying all the M measure- 
ments. We can use any sparse decoder, including greedy 
matching pursuit, SBL, Ip formulations, and even the 
brute-force decoder, but we require that the solution at 
each step M satisfies yi = ajx^^, for i = 1,..,M. For 
example, in the case of basis pursuit, we would solve 

x*^ = argmin ||x||i s.t. ELi:x. = yi, i~l,..,M. (1) 

Next, we receive one more measurement, and check 
for one step agreement: i.e. if x*^+^ = x*^, then the 
decoder declares x^^ to be the reconstruction and stops 
requesting new measurements. In Section |III] we show 
in Propositions [T] and |2] that this decoder gives exact 
reconstruction with probability one. 

For some other measurement ensembles, such as ran- 
dom Bernoulli and the ensemble of random rows from 
a Fourier basis, the one-step agreement stopping rule no 
longer has zero probability of error. We modify the rule 
to wait until T subsequent solutions x^^, x*^+^ all 
agree. In Section HVl we show in Proposition[3]that in the 
Bernoulli case the probability of making an error using 
this stopping rule decays exponentially with T, allowing 
trade-off of error probability and delay. 

In Sections |V] and |VT] we show how the error in 
reconstruction can be estimated from the sequence of 
recovered solutions. We first present analysis for the 
Gaussian measurement ensemble in Proposition |4] and 
then generalize to any sensing matrices with i.i.d. entries. 
This enables the decoder to stop once the error is below a 
required tolerance - even for signals that are not exactly 
sparse, but in which the energy is largely concentrated 
in a few components, or for measurements which are 
corrupted by noise. 

Finally, in Section IVIII we motivate the need for 
efficient solvers in the sequential setting. We consider 
the basis pursuit sparse solver and show that rather than 
re-solving the problem from scratch after an additional 
measurement is received, we could use an augmented 
linear program that uses the solution at step M to guide 



its search for the new solution. We show empirically 
that this approach significantly reduces computational 
complexity. 

During the review process we learned about a very 
recent analysis in (TFl for the cross-validation setting, 
using the Johnson-Lindenstrauss lemma. We describe 
similarities and differences from our work in the discus- 
sion in Section [V] Our current paper extends our earlier 
results presented in l,16J . 

II. Brief overview of compressed sensing 

As there is no dearth of excellent tutorials on com- 
pressed sensing jT], ||2l, fj/TJ, in this section we give 
only a brief outline mainly to set the stage for the rest 
of the paper At the heart of compressed sensing lies 
the sparse recovery problerrQ. which tries to reconstruct 
an unknown sparse signal x from a limited number of 
measurements y = Ax, where A e R^^^^, M « N. 
Much of excitement in the field stems from the fact 
that the hard combinatorial problem of searching for 
sparse solutions in the affine space {x : y = Ax} 
under certain suitable conditions can be solved exactly 
via various tractable methods. The most widely known 
methods include greedy matching pursuit and its variants 
|8 1, and approaches based on convex optimization, using 
£i norms as a proxy for sparsity Q: 

min||x||i subject to y = Ax. (2) 

An early sufficient condition for sparse recovery [18] 
states that the formulation in (|2|i recovers the unique 
sparse solution if A is well-posed and x is sparse enough, 
i.e. if ||x||o < i+l^ii^^ where M{A) = maxi^j |a^aj|, 
and A has columns a^ normalized to 1. However, 
this simple condition is very pessimistic. Much tighter 
conditions are obtained by considering larger subsets 
of columns of A, e.g. the restricted isometry property 
(RIP) depends on the maximum and minimum singular 
values over all M y. K submatrices of A llT2l . Namely, 
a matrix A satisfies the ii'-RIP with constant 5k if 
(1 - 5k)MI < < (1 + Sk)H\1 for every x 

which has at most K non-zero entries. WTiile enabling 
much tighter sufficient conditions for recovery of sparse 
signals |12J, the RIP is very costly (exponential in K) 
to check for a given matrix. 

Results in compressed sensing take advantage of RIP 
by bringing in the theory of random matrices into 
the picture. In compressed sensing we receive random 

' The ground-breaking results |18| predating compressed sensing 
were in context of sparse signal representation where one seeks to 
represent a vector y in an overcomplete dictionary A S M*^^^, 
M << N, with coefficients x, i.e., y = Ax. 
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Fig. 1 . Histogram of tlie stopping times distiibution for Gaussian and 
Bernoulli measurement ensembles: TV = 100, and K = 10, and l\ 
decoding. 



measurements y = ^Ps where the unknown signal of 
interest s is itself sparse in some basis, i.e. s = $x, with 
X sparse. Hence the problem reduces to finding sparse 
solutions satisfying y = 4'<l'x = Ax, where A = 'i'^ is 
a random matrix. 

A collection of results have been established that RIP 
holds for random matrices of certain size from given 
ensembles: Gaussian, Bernoulli, random Fourier rows 
|l2J, 1 12 1, [14|. The general conclusion of these results 
is that the convex £i formulation can recover (with high 
probability) a sig nal X e with K non-zeros from 
only CK\og{N) measurements, where C is a constant 
depending on the random measurement ensemble. This 
is indeed remarkable - as it only requires a logarithmic 
dependence of the number of measurements on N. 

However, when each additional measurement is very 
costly there are several problems with these bounds - 
firstly, since they are high-probability results independent 
of y, they tend to be conservative, and also the constants 
C are typically generous upper-bounds. Secondly, the 
number of measurements depends on the number of 
non-zero components of x which may not be known 
a-priori. Finally, there are successful approaches which 
we mentioned in Section J] for which no such results are 
available. 

In Figure [T] we illustrate the drawbacks of using upper 
bounds on the number of measurements. We find the 
minimum number M of random samples which were 
needed to recover a sparse signal x with N = 100, 
and K = 10 from random Gaussian and Bernoulli 
measurements using the £i -formulation in over 500 
random trials. We plot a histogram of these numbers, and 



we see that they exhibit high variance, and so relying on 
conditions that guarantee recovery with high probability 
often means taking many unnecessary samples. This 
motivates the need for sequential compressed sensing 
scenario that can adaptively minimize the number of 
samples for each observed y, which we describe next. 

III. Stopping rule in the noiseless continuous 

CASE 

We now analyze the sequential CS approach for the 
case when the measurements vectors a; come from a 
continuous ensemble (e.g., the i.i.d. Gaussian ensemble), 
having the property that with probability 1 a new vector 
aA/+i will not be in any lower-dimensional subspace 
determined by previous vectors {ai}f£^. Suppose that 
the underlying sparse signal x* G has K non- 
zero components (we denote the number of non-zero 
entries in x by ||x||o). We sequentially receive ran- 
dom measurements — a^x*, where for concreteness 
aj ~ A/'(0, /) is a iV-vector of i.i.d. Gaussian samples, 
but the analysis also holds if entries of a, are i.i.d. 
samples of an arbitrary continuous random variable. At 
step M we use a sparse solver of our choice to obtain a 
feasibl^ solution x*^ using all the received data. Results 
in compressed sensing |[T], lfT4l indicate that if we use 
basis pursuit or matching pursuit methods, then after 
receiving around M cx K log(A^) measurements we can 
recover the signal x* with high probability. This requires 
the knowledge of K, which may not be available, and 
only rough bounds on the scaling constants are known. 
Our approach is different - we compare the solutions at 
step M and A/ + 1, and if they agree, we declare correct 
recovery. 

Proposition 1: If in the Gaussian (generic continuous) 
measurement ensemble it holds that x*^+^ = x*^, then 
_ jj.*^ probability 1. 

Proof. Let yi.,M = [yi, yA/]', and 
A^' = [ai,...,aM]'. Suppose that x*^ ^ x*. We 
have that yi:M = A*^x*^ and yi^M = A^x*: both x* 
and x*^ belong to the (A^ — A/) -dimensional affine space 
{x I yi:7\/ — A^^x}. The next measurement passes a 
random hyperplane yM+i — sl'^.j+i^* through x* and 
reduces the dimension of the affine subspace of feasible 
solutions by 1. In order for x*^ to remain feasible 
at step M + 1, it must hold that yj\/+i — ^m+i^*^- 
Since we also have yAf+i = a-M+i^*, then x^^ 
remains feasible only if (x^^ — x*)'aj\/+i = 0, i.e. 

-This requirement is essential for the noiseless case (it is relaxed in 
later sections). For greedy methods such as matching pursuit this means 
that we allow enough iterations until all the measurements received so 
far are satisfied perfectly. Noiseless basis pursuit formulations satisfy 
it by construction. 



3 




Fig. 2. A new constraint is added: a'^^^j^x = yM+i- Probability 
that this hyperplane passing through x* also passes through x*^ is 
zero. 

if aM+1 falls in the — 1 dimensional subspace of 
con-esponding to Null{{±^'' - x*)'). As Slm+i is 
random and independent of x*^ and of the previous 
samples ai, ajv/, the probability that this happens is 
(event with measure zero). See Figure |2] for illustration. 
□ 

Note that the proof implies that we can simplify 
the decoder to checking whether a'^^^^^j^x*'^ = um+i, 
avoiding the need to solve for x*^+^ at the last stefH. 
Moreover, if using any sparse solver in the continuous 
ensemble case the solution x^"^ has fewer than M non- 
zero entries, then x^^ = x* with probability 1. 

Proposition 2: For a Gaussian (continuous) measure- 
ment ensemble, if ||x^-'^||o < M, then x*^ = x* with 
probability 10 

Proof. Denote the support of our unknown sparse 
vector X* by I, i.e. X = {i \ x* 7^ 0}. We next 
generate a random measurement matrix A^^. Let A = 
A'^^ to simplify notation. We receive the corresponding 
measurements y Ax*. Now A is M x N, with 
M < N. The key fact about random matrices with 
i.i.d. entries from a continuous distribution is that any 
M X M submatrix of A is non-singular with probability 
10. We now argue that with probability 1 after receiving 
y there will not exist another sparse feasible solution 
X ^ X*, i.e. X with fewer than M non-zero entries 
satisfying y = Ax . We consider all possible sparse 
supports J C {1, .., N}, with \ J\ < M, and show that 

'We thank the anonymous reviewer for this simplitication. 

''Note that a random measurement model is essential: for a fixed 
matrix A if 2K > M then there exist xi and X2 such that Axi = 
j4x2 and ||xi||o < K. However, for a fixed x* with ||x*||o < M the 
probability that it will have ambiguous sparse solutions for a random 
choice of A is zero. 

^This is easy to see: fix T C {!,..., A^} with |T| = M. Then 
probability that ^ ^V^-i^i^Ti) ■■■)^Tm-i) zero, as Ay^j- 

is a random vector in M*^ and the remaining columns span a lower- 
dimensional subspace. 
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Fig. 3. Gaussian ensemble example: N = 100, and K = 10. (Top): 
||x"||o. (Middle): l|x"||i. (Bottom): ||x*-x*^||2. 



a feasible solution x 7^ x* can have this support only 
with probability 0. There are two cases: I C ^7 and 

First suppose I <Z J , \ J\ < M, and suppose there 
exists some feasible x supported on J'. Then x — x* e 
NuU{A), and support of x— x* is a subset of J', hence it 
is smaller than AI. But that means that there is a subset of 
fewer than AI columns of A that are linearly dependent, 
which can only happen with probability zero. 

Now consider the case I I D J^. For a fixed I 
we consider all such possible sets J', with < AI. 
First fix one such set J'. We use the notation 
I\J = { i e I \ i ^ J}. Note that we have 
y = Ax* = Ajx*j + Ax\jx^\j. Let y = Ax\jx^\j. 
Now since we require x to be feasible, we also 
need y = Ax = Ajxj which would imply that 
y Aj{xj —x*j). This means that the vector y would 
also have to be in the span of Aj. However, y is a 
random vector in M*^ (determined by x* and Ax\j), 
and span of Aj is an independent random subspace 
of dimension strictly less than AI. Hence, the event 
that y also falls in the span of Aj has measure zero. 
This means that for a fixed J a distinct sparse solution 
can only exist with probability 0. Now the number 
of possible subsets J is finite (albeit large), so even 
when we take all such supports J, a distinct sparse 
solution supported on J can only exist with probability 
0. Hence, with probability 1 there is only one solution 
with ||x||o < M, namely x*. □ 

This proposition allows to stop making measurements 
when a feasible solution has less than M nonzero 
entries - avoiding the need to make the last [M + l)-st 
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measurement. 

Consider an example in Figure |3] with = 100, and 
K — 10. We keep receiving additional measurements 
and solving ([T]l until we reach one-step agreement, 
ic*^ = x*^"*"^. The top plot shows that ||x^-'^||o increases 
linearly with M until one step agreement occurs at 
M — 35, at which point it drops to X = 10 and a and 
we recover the correct sparse solution, x^^ = x*. The 
middle plot shows the monotonic increase in ||x*^|ii (as 
the feasible set is shrinking with M). The bottom plot 
shows the error-norm of the solution, lli*^ -xlU. On 
average it tends to go down with more observations, but 
non-monotonically. After M = 35 the error becomes 
zero. We see that in the ideal conditions of no mea- 
surement noise, sparse unknown signals and Gaussian 
measurement ensembles, the number of measurements 
can be indeed minimized by a simple stopping rule. 

IV. Stopping rule in the Bernoulli case 

In this section we study a simple but popular measure- 
ment ensemble that is not one of the generic continuous 
ensembles described in the previous section. Suppose 
that the measurement vectors have equiprobable 
i.i.d. Bernoulli entries ±1. A difference emerges from 
the Gaussian case: the probability that all AI x M 
submatrices of A^'^ are non-singular is no longer 0. 
This makes it possible (with non-zero probability) for 
-jj.Af+1 agree with x*^ even though x^^ ^ x*, and 
for erroneous solutions x*^ to have cardinality less than 
M. We modify the stopping rule to require agreement 
for several steps - success is declared only when last 
T solutions all agree. We show in proposition [3] that the 
probability of error decays exponentially with T. We use 
the following Lemma from [19] : 

Lemma 1 (Tao and Vu): Let a be an i.i.d. equiproba- 
ble Bernoulli vector with a € {—1, 1}^. Let be a de- 
terministic c?-dimensional subspace of ]R^, < d < A^. 
Then P(a e W) < 2'^-^. 

We are now ready to establish the following claim: 

Proposition 3: Consider the Bernoulli measurement 
case. If x^^ = ic^^+i = ... = x*^+^, then x*^ = x* 
with probability greater than or equal to 1 — 2^^. 

Proof. Suppose that x*^ ^ x*. Denote the support of 
X* by I and the support of x*^ by J'. At step AI we have 
A^^x* = A^-^A^^ Let W ^ {a\ (A*^ - x*)'a = 0}, 
i.e. the nullspace of (x*^ -x*)'. Then W is an (A^- 1)- 
dimensional subspace of . 

Given a new random Bernoulli sample slm+i, the 
vector x*^ can remain feasible at step M + 1 only 
if (x*^ - X*)' a.M+1 = 0, i.e. if slm+i falls into W. 
By Lemma [H the probability that aM+i G T4^ is a 



most 1/2. The same argument applies to all subsequent 
samples of a^z+i for i — 1,..,T, so the probability of 
having T-step agreement with an incorrect solution is 
bounded above by 2^^^. □ 

Note that as in the discussion for the continuous case, 
we can simply check that a'^.^^^x*^ = UM+i for i = 
1, ...,T, avoiding the need to solve for x*^+^. 

We now pursue an alternative heuristic analysis, more 
akin to Proposition |2] For the Bernoulli case, ||x*^||o < 
AI does not imply x*^ — x*. However, we believe that 
once we obtain enough samples so that N'^2^^^'^ <^ 1 
then ||x^^||o < M will imply that x^^ = x* with high 
probability. Since the elements of a^ belong to finite set 
{ — 1,1}, an M X AI submatrix of A^' can be singular 
with non-zero probability. Surprisingly, characterizing 
this probability is a very hard question. It is conjectured 
|fT9l that the dominant source of singularity is the event 
that two columns or two rows are equal or opposite in 
sign. This leads to the following estimate (here Xm is 
M X M)0 

P(det ATm = 0) = (1 + o{l))M^2^-^'' . (3) 

However the very recent best provable bound on this 
probabiHty is still rather far: P(detA:M = 0) = ((| + 
o{l))^) ||T9l . If we assume that the simple estimate 
based on pairs of columns is accurate, similar analysis 
shows that the probability that a random ±1 M x A^ 
matrix with AI ^ A^ having all AI x AI submatrices 
non-singular is (1 + o{l))N^2^-^^ . 

V. Near-sparse signals 

In practical settings, e.g. when taking Fourier and 
wavelet transforms of smooth signals, we may only have 
approximate sparseness: a few values are large, and most 
are very small. In this section we extend our approach to 
this case; again, and in contrast to existing work, we do 
not need to assume a specific near-sparse structure, Uke 
power-law decay, but instead provide bounds that hold 
for any signal. 

The exact one-step agreement stopping rule from Sec- 
tion|lII]is vacuous for near-sparse signals, as ||x* ||o = A^, 
and all samples are needed for perfect recovery. We start 
by considering Gaussian measurements, and show that 
we can gather information about the current reconstruc- 
tion error by obtaining a small number of additional 
measurements, and computing the distance between the 
current reconstruction and the affine space determined 

^Probability that two columns are equal or opposite in sign is 2^~*^, 
and there are O(Af^) pairs of columns. 
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Fig. 4. Geometiy of the analysis for near-sparse signals. The unknown 
reconstruction eiTor is related to d{ii.^^ , H m +t) and the angle 
between the line from x* to x*^ and the affine space Hm+t defined 
by the new measurements. 





sample mean 




mean estimate 




— bound on mean 







" \ 


sample std 




bound on std 


1 
1 

1 

1 




1 




% 









by these new measurements. The reconstruction error is 
then equal to an unknown constant times this distance: 



(4) 



where Hj^j+t — {x \ yi — a'^x, 1 < i < AI + T} is the 
affine space determined by all M + T measurements, 
Ct is a random variable that we will bound, and 
(i(x*^, Hm+t) denotes the distance from x*^ to Hm+t- 
We characterize E[Ct] and Var[CT] - this gives us 
a confidence interval on the reconstruction error using 
the observed distance d{i.^'^ , Hm+t)- We can now stop 
taking new measurements once the error falls below a 
desired tolerance. Note that our analysis does not assume 
a model of decay, and bounds the reconstruction error by 
obtaining a small number of additional measurements, 
and computing the prediction error In contrast, some 
related results in CS literature assume a power-law 
decay of entries of x* (upon sorting) and show that 
with roughly 0{K log N) samples, x*^ in ([T]i will have 
similar error to that of keeping the K largest entries in 

X* im. 

We now outline the analysis leading to a bound based 
on dUl. Consider Figure |4| Let Hm = {x : A^^:x. = 
Yi-.m} be the subspace of feasible solutions after M 
measurements. Both x* and x^^ lie in H]\j. The affine 
space H]\j+T is contained in Hm- Let L — N ~ M, 
and Ot be the angle between the vector x^^ — x* and 
the affine space Hm+t- Both are contained in the L- 
dimensional space Hm- Centering around x*, we see 
that Ot is the angle between a fixed vector in i?^ and 
a random L — T dimensional subspace of i?^, and the 
constant Ct in © is equal to 



sin(6'T) 
M jj 



M+T 



sm 



(5) 



We next analyze the distribution of 9t and hence of Ct- 
In distribution, 9t is equivalent to the angle between a 



Fig. 5. (Top) sample mean, estimate of the mean, and a bound on 
the mean of Ct- (Bottom) sample standard deviation, and a bound on 
the standard deviation of Ct- Sample mean is based on 1000 samples. 
L = 100. 



fixed L — T dimensional subspace, say the one spanned 
by the last L — T coordinates, and an i.i.d. Gaussian 
vector (whose direction falls uniformly on a unit sphere 
in M^). This holds because the distribution of an i.i.d. 
Gaussian sample does not get changed after applying an 
arbitrary orthogonal transformation. Let H be the span 
of the last L — T coordinate vectors, and h be i.i.d. 
Gaussian. Then: 



Ct — 



L 



sin(.)-^^^^^ 



T 

E 

i=l 



hi 



(6) 



Using the properties of xl, Xl^ and inverse-^/, distri- 
butions II20I and Jensen's inequality, we have an estimate 

of the mean i?[(7T] ~ and an upper bound on both 
the mean and the variance: 



E [Ct] < 
L 



Var [Ct] < 



T-2 



L-2 
T-2' 
2 L 

~~ T' 



(7) 



(8) 



We describe the analysis in Appendix |A] Using these 
bounds in conjunction with the Chebyshev inequalit}0, 
p(|a — £'[a]| > k(Ta) < -p-, we have the following result: 

Proposition 4: In the Gaussian measurement 
ensemble we have: ||x* — x^^||2 < C*^ d{i^^ , Hm+t) 

'To improve upon Chebyshev bounds we could directly characterize 
the cumulative density function of Ct - either analytically, or by 
simple Monte Carlo estimates. 
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Fig. 6. (Top) EiTor confidence bounds and actual errors for a sparse 
signal, N = 100, T = 5, = 10. (Bottom): Error confidence 
bound and actual errors for a signal with power-law decay, N = 1000, 
T = 10. 



with probability at least 1 — -p-, where 

In Figure |5] (top) we plot the mean estimate, and our 
bound in ^ for Ct and (bottom) the standard deviation 
bound for L = 100 and a range of T. We compare them 
to sample mean and standard deviation of Ct based 
on 5000 samples. The figure shows that both bounds 
provide very good approximation for most of the range 
of T > 2, and also that the standard deviation quickly 
falls off with T, giving tight confidence intervals. In 
Figure |6] we perform numerical experiments with two 
example signals, a sparse signal, N — 100, K = 10, 
T = 5 (top) and a near-sparse signal with power-law 
decay, N = 1000, T = 10 (bottom). We use basis pursuit 
to recover the signals as we obtain progressively more 
measurements, and we compare our error bounds (via 
Chebyshev inequality) to the actual errors. We see that 
the bounds reliably indicate the reconstruction error - 
after a small delay of T additional measurements. We 
have used basis pursuit in the experiments, but we could 
substitute any sparse solver instead, for example we 
could have also computed error estimates for matching 
pursuit. 



A. Analysis for More General Ensembles 

To get the bound in (|4]l we characterized the distri- 
bution of ^.^^^g^^ and used the properties of the Gaus- 
sian measurement ensemble. Analysis of 9t for general 
ensembles is challenging. We now consider a simpler 
analysis which provides useful estimates when T << L, 
i.e. the case of main interest for compressed sensing, 
and when the measurement coefficients aij are from 
an i.i.d. zero-mean ensemble. The previous bound for 
the Gaussian case depended on both AI, the number of 
samples used for the current reconstruction, and T, the 
number of extra samples. Now, in the following we give 
estimates and bounds that depend only on T, and in 
that sense could be weaker for the Gaussian case when 
M is large; they are however more generally applicable 
- in particular we no longer require x^^ to satisfy the 
measurements exactly. 

Suppose we have a current reconstruction ic, and 
suppose X* is the (unknown) true signal. We now take T 
new samples yi — a^x*, for I < i < T. For each of these 
samples we compute = a^x*^ to be the same vector 
applied to the current reconstruction. Denote the current 
error vector hy S = x*^ — x*, and compute z,; — iji — Ui, 
the deviations from the actual measurements. Then 

z, = a'^S, l<i<T (9) 

The new measurements a^ are independent of x and 
of X*, hence of d. The z,'s are i.i.d. from some (un- 
known) distribution, which has zero mean and variance 
ll^lll Var{aij). We can estimate ||<5||2 by estimating the 
variance of the z^'s from the T samples. The quality of 
the estimate will depend on the exact distribution of a^ . 

Consider the case where a^ are i.i.d. Gaussian. Then 
Zi is Gaussian as well. For simplicity suppose that 
Var{aij) = 1, then the distribution of Zi is i.i.d. 
Gaussian with zero-mean and variance ||'^||2" Let Zj^ — 
E'ill+i Then Zt ^ xh i-^- random 

variable with T degrees or freedom. Now to obtain a 
confidence interval for \\S\\2 we use the cumulative Xt 
distribution. We pick a confidence level 1 — a (for some 
small a > 0), and we use the Xt cumulative distribution 
to find the largest z* such that p{Zt < z*) < a0 

During the review process a related analysis in ifTSll 
was brought to our attention: the paper considers com- 
pressed sensing in a cross-validation scenario, and it 
proposes to estimate the errors in the reconstruction 
from a few additional (cross-validation) measurements. 

* We have that ct^ = ^& gives the smallest value of such that 
probabiHty of observing is at least a. That is to say, the bound 
il^lP < will hold for at least 1 — a fraction of realizations of 
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Fig. 7. Compai'ison of and sin(6) analysis. Given a unit-norm 
vector S, we obtain T additional measurements, and compute our two 
estimates of \\S\\. We plot the histogram of the estimates over 5000 
trials with N = 250, T = 25, and (a) M = 0, (b) M = 200. 



The paper cleverly uses the Johnson-Lindenstrauss (JL) 
lemma to find out how many random measurements are 
needed for predicting the error to a desired accuracy. For 
Gaussian measurements ensembles our x^-based analy- 
sis can be seen as a special case (where all the constants 
are computed explicitly since we use the exact sampling 
distribution of Zt), but JL lemma also generalizes to 
other ensembles satisfying certain requirements on the 
decay of the tails |15], IITI . 

To compare our analysis in (|5]), based on Ct, to 
the one in (|9]l we note that the latter simply estimates 
the error \\6\\ as where A are the new mea- 

surement^. Now unlike the analysis in in (|5]l we 
require that the solution at step M is feasible (matches 
all the measurements) and instead we compute the error 
of projecting S onto the null-space of A and adjust it 
by the expected value of -^^^f^, i.e. we estimate \\6\\ 

as ^J^\\A'{AA')-^AS\\, where A includes all M + T 
measurements. To compare the quality of the two esti- 
mates we conducted a simulation with N — 250 and 
T — 25, and computed the estimates for random unit- 
norm vectors d. We plot the histograms for M = and 
M — 200 over 5000 trials in Figure I?] In the first case 
with M = 0, we see that both estimates have about the 
same accuracy (similar error distributions), however as 
AI becomes appreciable the approach in (|5]l becomes 
more accurate. 



'This is essentially the same estimate as the one based on JL lemma 
in 1151 , as the expected value of Xt is T, hence E[Zx\ = 



VI. Noisy case 

Next we consider the sequential version of the noisy 
measurement setting, where the observations are cor- 
rupted by additive uncorrected i.i.d. Gaussian noise with 
variance cr^^: 

= a^x + n,, ie{l,..,M}. (10) 

To solve this problem one can adapt a variety of sparse 
solvers which allow inexact solutions ic*^ in the se- 
quential setting - for example matching pursuit methods 
with a fixed number of steps, or the noisy versions 
of basis pursuit. All of these methods have a trade-off 
between sparsity of the desired solution and the accuracy 
in representing the measurements. In the case of basis 
pursuit denoising a regularization parameter A balances 
these two costs: 

x*^ -argmini||yi:M-A*^x||2 + AM||x||i. (11) 

For greedy sparse solvers such as matching pursuit and 
its variants the trade-off is controlled directly by deciding 
how many columns of A to use to represent y. We are 
interested in a stopping rule which tells us that x is 
reasonably close to x* for any sparse solver and for any 
user defined choice of the trade-off between sparsity and 
measurement likelihood. We do not discuss the question 
of selecting a choice for the trade-off - we refer the 
readers to [22], fST] and also to flSl for a discussion 
of how this can be done in a cross-validation setting. 
Now, due to the presence of noise, exact agreement will 
not occur no matter how many samples are taken. We 
consider a stopping rule similar to the one in Section IV] 
In principle, the analysis in (|4|i can be extended to the 
noisy case, but we instead follow the simplified analysis 
in Section IV-AI 

We establish that the reconstruction error can be 
bounded with high probability by obtaining a small 
number of additional samples, and seeing how far the 
measurements deviate from iji = ajx*^. With such a 
bound one can stop receiving additional measurements 
once the change in the solution reaches levels that can 
be explained due to noise. The deviations Zi now include 
contribution due to noise: 

2i = ?/i - 2/i = a-(x*^ - X*) - n^. (12) 

Let Zt = J^^f- Consider the Gaussian measurement 
ensemble. Then Zi = a^^+n^, and Zt — ~ Xt- 

The distribution of Zi is Gaussian with mean zero and 
variance ||^||2 + o'n. Now following a similar analysis as 
in previous section we can obtain an estimate of ||5||2 + 
cr^j from a sample of Zt, and subtracting af^ we get an 
estimate of ||^||2. 
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Fig. 8. Error estimate in the noisy case: true eiTor and a 90-percent 
confidence bound (dB scale): N = 1000, T = 10, K = 100. 



We show an example in Figure |8] where the true error 
appears along with a 90-percent confidence bound. We 
have N = 1000, K = 100, T = 10 and cr„ = 0.01. We 
use basis pursuit denoising ( fT2] l as our choice for sparse 
solver, and we set Xi\i oc y/ M log(iV) motivated by the 
universal rule for wavelet denoising |22| to account for 
noise added with additional measurements. The bound 
clearly shows where the sparse signal has been recovered 
up to the noise floor (the signal is sparse with K = 100 
non-zero elements). 



VII. Efficient sequential solution 

The main motivation for the sequential approach is to 
reduce the number of measurements to as few as possi- 
ble. Yet, we would also like to keep the computational 
complexity of the sequential approach low. We focus 
on the £i-based formulations here, and show that there 
is some potential of using "memory" in the sequential 
setting for reducing the computational complexity. For 
the static setting there exists a great variety of approaches 
to solve both noiseless and noisy basis pursuit (i.e. 
basis pursuit denoising) in various forms, e.g. Il23l - 
ll25l . However, instead of re-solving the linear program 
(HJ after each new sample, we would like to use the 
solution to the previous problem to guide the current 
problem. It is known that interior point methods are 
not well-suited to take advantage of such "warm-starts" 
ll23l . Some methods are able to use warm-starts in 
the context of following the solution path in (fTTl i as 
a function of A 1231, |l26|, ||27|. In that context the 
solution path x(A) is continuous (nearby values of A give 
nearby solutions) enabling warm-starts. However, once 
a new measurement is received, this in general makes 
the previous solution infeasible, and can dramatically 
change the optimal solution, making warm-starts more 



challenging^- 

We now investigate a linear programming approach 
for warm-starts using the simplex method to accomplish 
this in the noiseless case (a similar strategy can be used 
with the Dantzig decoder |T| for the noisy case). We can 
not use the solution x^^ directly as a starting point for 
the new problem at step M + 1, because in general it 
will not be feasible. In the Gaussian measurement case, 
unless x*^ = x*, the new constraint a^^_(_-^x*^ = ijm+i 
will be violated. One way to handle this is through a dual 
formulatior03, but we instead use an augmented primal 
formulation |,29i . 

First, to model ([U as a linear program we use the stan- 
dard trick: define = ma.x{xi, 0), ~ max(— Xj, 0), 
and X = x"*" — x^. This gives a hnear program in 
standard form: 



min I'x 



I'x- 



and x+ , X 



(13) 

> 



Next we need to add an extra constraint i/m+i = 
a^^_^ix+ - a^^^^x". Suppose that a'^^^;^x^^ > yu+i- 
We add an extra slack variable z to the linear program, 
and a high positive cost Q on z. This gives the following 
linear program: 



yi:M 



VM+l 



■ -I ' + 
imn 1 X 



I'x- + Qz (14) 
and x"*" , x~ > 
- z, and z > 



Now using X 



M 



and z — 



a 



M- 



a^j_i_]^(x*^)^ — Um+i yields a basic feasible solution to 
this augmented problem. By selecting Q large enoughOl 
z will be removed from the optimal basis (i.e. z is set to 
0), and the solutions to this problem and the {M + l)-th 
sequential problem are the same. 

We test the approach on an example with N = 200, 
K — 10, and 100 trials. In Figure |9] we plot the number 
of iterations of the simplex method required to solve the 
problem ([TJ at step M from scratch (LPl) and using 
the formulation in (fT4l l (LP2). To solve (fTSl l we first 
have to find a basic feasible solution, BFS, (phase 1) 
and then move from it to the optimal BFS. An important 
advantage of (fl4l l is that we start right away with a BFS, 

'"in related work, |28| proposed to use Row-action methods for 
compressed sensing, which rely on a quadratic programming for- 
mulation equivalent to (T) and can take advantage of sequential 
measurements. 

"if at step AI the optimal dual solution is p, then a feasible solution 
at step _A/+1 is [p; 0]. However, it may not be a basic feasible solution. 

'^E.g. the big-Af approach |29| suggests treating Q as an undeter- 
mined value, and assumes that Q dominates when compared to any 
other value. 
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Fig. 9. A comparison of tlie number of simplex iterations when 
solving (T) from scratch (LPl) and using the solution at step M — 1 
(LP2). We plot the average number of iterations vs. M, over 100 trials. 

SO phase 1 is not required. The figure illustrates that for 
large M the approach LP2 is significantly faster. 

We note that recently a very appealing approach 
for sequential solution in the noisy setting has been 
proposed based on the homotopy continuation idea Il30l . 
|[31|, where a homotopy (a continuous transition) is 
constructed from the problem at step M to the problem 
at step M + 1 and the piecewise-smooth path is followed. 
The efficiency of the approach depends on the number 
of break-points in this piecewise-smooth path, but the 
simulations results in the papers are very promising. We 
also note that [30] proposes an approach to select the 
trade-off in the noisy case, using cross-validation ideas. 

VIII. Conclusion and discussion 

This paper presents a formulation for compressed 
sensing in which the decoder receives samples sequen- 
tially, and can perform computations in between samples. 
We showed how the decoder can estimate the error in 
the current reconstruction; this enables stopping once the 
error is within a required tolerance. Our results hold for 
any decoding algorithm, since they only depend on the 
distribution of the measurement vectors. This enables 
"run-time" performance guarantees in situations where a- 
priori guarantees may not be available, e.g. if the sparsity 
level of the signal is not known, or for recovery methods 
for which such guarantees have not been established. 

We have studied a number of scenarios including 
noiseless, noisy, sparse and near sparse, and involving 
Gaussian and Bernoulli measurements, and demonstrated 
that the sequential approach is practical, flexible and 
has wide applicability. A very interesting problem is to 
both extend the results to other measurement ensembles, 
e.g. for sparse ensembles, and moreover, to go beyond 
results for particular ensembles and develop a general 
theory of sequential compressed sensing. Furthermore, in 
many important applications the sparse signal of interest 
may also be evolving with time during the measurement 
process. Sequential CS with a notion of 'time of a 
measurement' is a natural candidate setting in which to 
explore this important extension to the CS literature. 



We also remark that there is a closely related problem 
of recovering low-rank matrices from a small number 
of random measurements [32J, [33 1, where instead of 
searching for sparse signals one looks for matrices with 
low-rank. This problem admits a convex 'nuclear-norm' 
relaxation (much akin to £i relaxation of sparsity). Some 
of our results can be directly extended to this setting 
- for example if in the Gaussian measurement case 
with no noise there is one-step agreement, then the 
recovered low-rank matrix is the true low-rank solution 
with probability one. 

Finally we comment on an important question ||6l, ll34l 
of whether it is possible to do better than simply using 
random measurements - using e.g. experiment design or 
active learning techniques. In [6J the authors propose 
to find a multivariate Gaussian approximation to the 
posterior p(x |y) where p(y | x) oc exp(45-||y — Ax|p), 
andp(x) oc exp(— A||x|| i). Note that MAP estimation in 
this model x = argmaxxp(x | y) is equivalent to the 
formulation in (fTTT l. but does not provide uncertainties. 
Using the Bayesian formalism it is possible to do ex- 
periment design, i.e. to select the next measurement to 
maximally reduce the expected uncertainty. This is a very 
exciting development, and although much more complex 
than the sequential approach presented here, may reduce 
the number of required samples even further. 



Appendix A 
Derivation of the distribution for 



|h||i] 



Since 



Consider Elain^ie)] = eUJ^I 
Si ^[pit?] ^ Q^ch hi is i.i.d., we have 

E[^] = \. In fact E[^] follows a Dirichlet 
distribution. Therefore, i?[sin^(6')] — ^. 

Using Jensen's inequality with the convex function 
^JTJx, x > 0, we have E[l/s\\i{e)] > y^. 

Now, E[ ' 
because E 



^j^] = ^ (for T > 2). This is true 



1 



E (Etr+i hVY.t=^ h^) ^ 1 + {L- T)^. 
The second term is a product of a random variable 
with (L — T) degrees of freedom and an independent 
inverse-x'^ distribution with T degrees of freedom: 



E[Et 



=T+1 



hj] = L- T, and E[ 



1 

T-2' 



201. Now l + (L-T)/(T-2) = (i-2)/(r-2) 



Finally, using Jensen's inequality with the concave 



function Vi, E[^] < x/H- 
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